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ABSTRACT 

We describe a hybrid Direct Simulation Monte Carlo (DSMC) code for simultaneously 
solving the collisional Boltzmann equation for gas and the collisionless Boltzmann 
equation for stars and dark matter for problems important to galaxy evolution. This 
project is motivated by the need to understand the controlling dynamics at interfaces 
between gases of widely differing densities and temperature, i.e. multiphase media. 
While more expensive than hydrodynamics, the kinetic approach does not suffer from 
discontinuities and it applies when the continuum limit does not, such as in the collapse 
of galaxy clusters and at the interface between coronal halo gas and a thin neutral gas 
layer. Finally, the momentum flux is carried, self-consistently, by particles and this 
approach explicitly resolves and thereby 'captures' shocks. 

The DSMC method splits the solution into two pieces: 1) the evolution of the 
phase-space flow without collisions; and 2) the evolution governed the collision term 
alone without phase-space flow. This splitting approach makes DSMC an ideal match 
to existing particle-based n-body codes. If the mean free path becomes very small 
compared to any scale of interest, the method abandons simulated particle collisions 
and simply adopts the relaxed solution in each interaction cell consistent with the 
overall energy and momentum fluxes. This is functionally equivalent to solving the 
Navier-Stokes equations on a mesh. Our implementation is tested using the Sod shock 
tube problem and the non-linear development of an Kelvin-Helmholtz unstable shear 
layer. 

Key words: hydrodynamics — atomic processes — methods: numerical — galaxies: 
ISM — ISM: structure, evolution 



1 INTRODUCTION 

1.1 Motivation 

The burgeoning volume of multiwavelength galaxy obser- 
vations reveals an interactive complexity of patterns that 
couple merging, environmental interaction, enhanc ed epochs 
of sta r-formation, and gas accretion histories (e.g. lMo et al.l 
l20ld ). Each mass component — dark matter, stellar and com- 
pact objects, atomic and molecular gas — is sensitive to dif- 
ferent although overlapping ranges of length and time scales. 
Key to piecing together these complex interactions is an un- 
derstanding of the mutual dynamical evolution of each com- 
ponent. For gas in particular, winds, accretion flows and 
other coUisionally-induced structures often produce phase 
interfaces at shocks. Not only do the density enhancements 
at these shocks result in radiative cooling and instabilities. 
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these enhancements provide gravitational handles for mo- 
mentum transfer and torques. 

More generally, galaxies are particularly sensitive to the 
conditions in transitional regimes. These transitions occur in 
density (e.g. between galaxy components), gas temperature 
and gravitational acceleration. For example, on the largest 
scales, the standard model predicts that gravitational poten- 
tial of a galaxy is dominated by dark matter. The baryons 
in the halo are rarefied with a temperature characteristic 
of the escape velocity from the halo, approximately 10^ de- 
grees K. Moving inwards, there is a transition between the 
baryon-dominated stellar and gaseous galaxy and the colli- 
sionless dark halo. In this region, the gravitational dynam- 
ics of both components may conspire to produce structure. 
The increased gas density shortens the atomic and molec- 
ular cooling time, producing a dense cold gas layer. This 
implies that there must be an interface of coexistence be- 
tween the discrete rarefied and dense phases. Finally, each 
of these gas phases and the gravitationally dominant col- 
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lisionless components has a characteristic length and time 
scale that produces a range of accelerations. 

These transitions regions provide tests of the standard 
and alternative galaxy formation hypotheses. In particular, 
numerical simulations based on the standard ACDM sce- 
nario have made a variety of predictions on small galac- 
tic scales with both successes and failures. However, first- 
principle physical simulations are often infeasible at the 
scales where many of the failures occur, so at this point, the 
failures may either intrinsic or methodological. Alternately, 
some propose that the root cause of the failures may be the 
nature of gravity itself and that modified Newtonian dynam- 
ics (MOND) produces a better prediction of observed rota- 
tions curves than standard ACDM. Since the outer galaxy 
will be dominated by dark matter in the standard scenario 
or by non-Newtonian forces in MOND, the response in the 
outer galaxy to the dynamics may provide a sensitive test 
for these hypotheses. In addition, the temporal differences 
between the coUisionless and coUisional responses of media 
within each galaxy component provides clues to the history 
of a galaxy's evolution that depend on the underlying cos- 
mogony. For example, the gas responds over several hundred 
million years to a disturbance in the stellar and dark-matter 
response to an external perturbation taking place over gi- 
gayears. The observation of both responses together can pro- 
vide key diagnostics to the evolutionary history and possibly 
the underlying physics. 

Unfortunately, these transition regions are difficult to 
simulate reliably. Therefore, the most productive use of the 
combined DSMC-n-body hybrid code will be the investi- 
gation of interface dynamics and energetics on small and 
intermediate scales. An experimental ver sion of our code 
uses the full CHI ANtQ atomic database (iDere et al.lll997l : 
iLandi et aLlbOllr ) for coUisional cross sections and recombi- 
nation (bound-bound and bound-free) and standard plasma 
transitions (free-free) for the ionized regime. The current 
version does not treat the electrons as a separate kinetic 
species, but rather assumes that they follow the ions. This 
restriction will be relaxed in a future version of this code 
and this will allow a fully self-consistent treatment of elec- 
tron conduction and allow the dynamical influence of fixed 
magnetic fields to be investigated. DSMC cou ld be incorpo- 
rated into a full-fledged PIC plasma code (e.g. lSerikov et al.l 
Il999f ) but this is well beyond the scope of our current im- 
plementation. 

Our main p urpose for this and the companion paper 
l|Weinbera2013l . hereafter Paper 2) is a demonstration of the 
value of the kinetic approach for astrophysical flows by ap- 
plying it to some classic scenarios. To facilitate this compar- 
ison, we use the standard, simplified local thermodynamic 
equilibrium (LTE) scheme (e.g. [Black 1981) rather than the 
fuU-fiedged self-consistent cross-section based approach. We 
will begin, in the next section, with a very brief overview of 
the different regimes for gas dynamics from rarefied to dense. 
This will motivate the need for an understanding of gas in 
the transitional regime. Section [3] describes the numerical 
approach in two parts: t|3.1l introduces the DSMC algorithm 
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and a hybrid variant that exploits the near-equilibrium solu- 
tion when the mean-free path is very short, and i]3.2l briefiv 
reviews the n-body code, describes the imple mentation of 
the DSMC algorithm in parallel using MPI IjGropp et al.l 
1 19991 : ICabriel et al.ll2004r ) and presents diagnostics necessary 
for parameter tuning. We present two code tests in 33] the 
standard one-dimensional shock tube (" t|4.1|l and the recov- 
ery of the Kelvin-Helmholtz instability ( i|4.2|l . We conclude 
in Sj5]with a discussion and summary. 



2 GAS SIMULATION 

Gas dynamics in galaxies is most often simulated through 
numerical solutions of the Navier-Stokes equations: 



'9v „ 

P1-+V.VV 



-VP -f- V ■ T 4- F, 



(1) 



where v is the flow velocity, p is the fluid density, P is the 
pressure, T is the stress tensor, and F represents body forces 
(per unit volume) acting on the fluid. In essence, this equa- 
tion is an application of Newton's second law to a continuum 
and is most often derived from this point of view. However, 
to truly begin with Newton's Laws requires the application 
of kinetic theo ry with the Boltzmann equation as a starting 
point (e.g. see lFritdliool ). 

The Navier-Stokes equations (eq. [T]) are rife with shock 
discontinuities in general and are notoriously difficult to 
solve. Physically, the width of the shock interfaces are of or- 
der the mean-free path, and therefore are formally inconsis- 
tent with the continuum approximation. On the other hand, 
these shock discontinuities are responsible for driving im- 
portant astrophysical phenomena on many scales (thermal 
heating, chemistry and radiation, turbulence, to name a few) 
and must be treated carefully. Because of this, much of the 
difficult work in computational fiuid dynamics concerns the 
approximations at interfaces. For example, grid and finite 
element-schemes use shock-capturing methods to stabilize 
the solution in presence of discontinuities. These methods of- 
ten introduce numerical dissipation to achieve stability and 
add some width to the discontinuity prevent numerically- 
induced oscillations. In the case of discontinuous Galerkin 
methods, one may use order reduction or limiting to pre- 
vent spurious oscillations near discontinuities. In smoothed 
particle hydrodynamics, artificial viscosity is used to prevent 
instability and seems to yield smoother contact discontinu- 
ities and shocks than grid based solvers. Spectral methods, 
which project the fiuid equations onto basis functions (e.g. 
spherical harmonics, Chebyshev polynomials), yield high- 
accuracy solutions but one must be even more careful at 
interfaces. Current state of the art in spectral methods is to 
use high-accuracy shock-fitting algorithms. 

A more general formulation of gas dynamics follows 
from the coUisional Boltzmann equation, 



a/ 
dt 



+ v-Vx/ + F-Vv,f = Q(/,,f) 



(2) 



which describes the change to the phase-space density, /, 
induced by the collisions between particles, Q{f,f)- Since / 
has a six-dimensional domain and the fields in the Navier- 
Stokes equation have a three-dimensional domain, equation 
((2)1 appears harder and certainly more time consuming to 
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Figure 1. Gas regimes as a function of atomic density in 
atoms/cc and characteristic scale in parsecs. The Kn = 1 line 
is shown in red. The shaded blue region is the transition region. 
For Kn > 10, the flow is strongly in the kinetic regime, where 
collisions and excitation may be important overall but rare. For 
Kn < 0.04, the continuum limit, e.g. the Navier-Stokes equation, 
is appropriate. A number of classic astronomical interface regimes 
are shown in labeled boxes. 



solve than equation ^. However, the high level of algo- 
rithmic complexity in computational fluid dynamics (CFD) 
follows from the mathematical requirements that arise from 
the maintenance of the continuum limit. On the other hand, 
the left-hand side of equation ((2| is the coUisionless part 
of the Boltzmann that may be readily solved using n-body 
techniques while the left-hand describes the particle colli- 
sions. Unlike the equations of hydrodynamics, the coUisional 
Boltzmann equation has no problem with transition regimes 
or discontinuities since the information about the collisions 
are carried ballistically rather than by constitutive relation- 
ships. Said another way, the continuum limit will often fail 
by deflnition at transition regimes while the kinetic approach 
represents the physical nature of the transition and cannot 
fail. The price for this generalization is performance: the ki- 
netic approach requires smaller timescales and length scales. 
Solutions of the coUisional Boltzmann equation are often an 
order of magnitude slower or more than hydrodynamic so- 
lutions for the same problem. 

Given the difficulties in solving equation ([1} and its in- 
appropriateness for true transitional regimes, it is worth ex- 
ploring alternatives. Direct solution of the coUisional Boltz- 
mann equation ([2} using a kinetic simulation method is 
attractive for a number of reasons. The simplest kinetic 
method, molecular dynamics, solves the full n-body system 
including the collisions directly, at often great expense. On 
the other hand, it is straightforward to incorporate any num- 
ber of species and specific physical couplings that would be 
difficult in CFD, and generalization to multiple species and 
interactions is straightforward. Several alternatives to pure 
molecular dynamics exist; each achieves computational ef- 
ficiency by approximating some aspect of equation ([2} . For 
example, lattice Boltzmann methods (LBM) discretize and 
solve equation ((21) on a grid and naturally yield the Navier- 
Stokes equation in the small mean-free path limit. This is 



a form of mesoscopic solution is designed to produce the 
correct solution on scales larger than the particle scale but 
smaller than the continuum macroscopic scale. The final ex- 
ample of a mesoscopic solution is Monte Carlo solution of 
Boltzmann equation. This approach exploits the classical 
indeterminacy of the coUisional trajectories on the mean- 
free-path scale. The effects of the collisions are incorpo- 
rated with a Monte-Carlo procedure that reproduces the 
per-particle cross sections that affect the flow at intermedi- 
ate length scales. This method of solution may be one of the 
few practical approaches available to understand multiscale 
phenomena such as turbulence. 

Some gas flows are not fluids and must be treated by a 
kinetic-theory approach. The nature of the flow is described 
in kinetic theory by the ratio of the mean free path to the 
characteristic scale, called Knudsen number or Kn: 



Knudsen number = Kn ; 



Mean free path A 

Characteristic size L 
In astrophysics, typical characteristic scales include den- 
sity, temperature, and gravitational-field scale lengths. For 
Kn > 0.05, the solutions of the cont inuum fluid equa - 
tions deviate from the exact solutions IjBovd et al.lll995l ). 
It should not be surprising that many of the most impor- 
tant astrophysical regimes are near this boundary, as shown 
in Figure [T] The knowledge of the physical state within the 
transition regime is necessary for predicting the physical pro- 
cesses that may dominate the observed emission or cooling 
even if Kn is small elsewhere. In other regimes, the con- 
tinuum limit is not appropriate even without a transition 
region or discontinuity. 



3 METHOD 



The Direct Simulation Monte Carlo (DSMC, see lBirdlHogJ ) 
method has been widely in engineering applications for gas 
flows beyond the continuum limit. DSMC is a numerical 
method, originally conceived for modeling rarefied gas par- 
ticles with mean free paths the same order or greater than 
the characteristic physical length scale (i.e. Kn > 1). In such 
rarefied flows, the Navier-Stokes equations can be inaccu- 
rate. But recently, the DSMC method has been extended to 
model near continuum flows (Kn <S 0.05), making it appro- 
priate for astrophysical problems with large dynamic ranges 
and multiple phases. In addition, DSMC is fully shock cap- 
turing in the sense that the discontinuity implied by the 
shock is naturally determined by DSMC. It reproduces stan- 
dard shock tests (see M.l^ and instabilities such as Kelvin- 
Helmholtz (see tj4.2|) . Because the method works for arbi- 
trary values of Kn, DSMC may be used in astrophysical 
problems that have short and long mean free paths that 
might occur in the interaction between cold gas and hot 
rarefied gas in cluster environments (e.g. see Paper 2 for an 
application to ram-pressure stripping). 

Furthermore, DSMC is always stable (no Courant- 
Friedrichs-Lewy condition). As will be described in i]3.1.2l 
there are conditions on the step size and collision param- 
eters for optimal performance, but poor parameter choices 
lead to some inaccuracy but not failure through instability. 
The main problem with DSMC approach is computational 
speed: for a point of comparison, DSMC is more efficient 
per particle than a smoothed-particle hydrodynamics (SPH) 
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code, but it requires at least an order of magnitude more 
particles to achieve a similar resolution. The performance 
in our implementation is bottlenecked by the tree structure 
supporting the parallel domain decomposition and thereby 
marks this area for further technical development. Nonethe- 
less, DSMC will never compete in speed and accuracy with 
hydrodynamic methods in the Kn — >■ limit. Rather, the 
goal of the DSMC method is an investigation of the effect 
of microphysics and gas dynamics in the transition regimes 
themselves; these regimes cannot be studied within the hy- 
drodynamic paradigm. 

3.1 Introduction to Direct Simulation Monte 
Carlo 

DSMC incorporates the internal degrees of freedom of the 
atoms and molecules in a gas through a variety of approx- 
imations that redistribute the kinetic energy, momentum 
and internal energy of two collision partners. For a sim- 
ple monoatomic gas, the only relevant energies are those 
of translation, electronic excitation and radiative emission 
by the particles. This system is described by a phase-space 
distribution function, /fe(x, v; e^, i), that represents the ex- 
pected number density of molecules of species fc in a small 
volume dx about the point at x which have a velocity in the 
range v to v -I- dv, and internal energy e^ a given instant t. 

The limiting kinetic equation for DSMC is the nonlinear 
collisional Boltzmann equation (Wagner 1992). DSMC splits 
the simulated solution of the collisional Boltzmann equation 
into two sequentially-applied parts. First, the particles are 
first advanced on colhsionless trajectories using the standard 
n-body method. Second, the flow field is divided into cells, 
and collisions are between the simulation particles realized 
consistent with the local collision rate. DSMC represents 
the atoms and molecules of a gas by much smaller number 
of simulation particles. The number of collisions reproduces 
the rate in the physical system by increasing the cross sec- 
tion for the simulation particles by the ratio of the simu- 
lation mass to the true mean atomic and molecular mass. 
This generic feature is exploited in a variety of ways to im- 
prove DSMC performance. For example, if the collision rate 
is sufficiently large, the velocity distribution in a cell will ap- 
proach the equilibrium Maxwell-Boltzmann distribution and 
this may be used to limit the number of computed collisions 
explicitly. If the collision rate is low, however, there will be 
little redistribution of energy and momentum and the dis- 
tribution will remain close to the colhsionless solution which 
may be far from thermodynamic equilibrium. The details of 
the algorithm will be outlined in >l3.1.1l below. 

DSMC is presently the most widely used numerical algo- 
rithm in kinetic theory (Garcia & Wagner 2000). In DSMC, 
particle pairs are randomly chosen to collide according to 
the probability distribution implied by the interparticle po- 
tential. This probability is proportional to the particles' rel- 
ative speed and effective geometric cross section. The post- 
collision velocities are determined by randomly selecting the 
collision angles and redistributing the energy into atomic 
and molecular internal degrees of freedom. Therefore, unlike 
molecular dynamics, DSMC particles are chosen to collide 
even if their actual trajectories do not overlap. This is not 
an inconsistency but required by the probabilistic nature of 
the solution. 



The application of the classic DSMC algorithm is re- 
stricted to dilute gases. Recently, the Consistent Boltzmann 
Algorithm (CBA) was i ntroduced as a simpl e variant of 
DSMC for dense gases (j Alexander et all 1 19951 ). The CBA 
collision algorithm follows the DSMC algorithm with two 
additions. First, the unit vector parallel to the line con- 
necting the centers at impact is computed from the pre- 
and post-collision velocities of the colliding pair. Each par- 
ticle is displaced a along this direction corresponding to the 
mean separation they would have experienced had they col- 
lided as hard spheres. Secondly, the collision rate must be 
increased over the dilute rate to account for the volume 
displaced by the hard spheres representing the atoms and 
molecules themselves. These algorithmic changes are un- 
likely to play a major role in astrophysical flows, but are 
trivial to include. With these two simple additions, CBA 
yields the hard sphe r e equ ation of state at all densities. 
iMontanero fc Santod (| 19961 ) showed that the high-density 
transport properties are not exact but remain comparable 
to with molecular dynamics simulations and the Enskog ap- 
proximation. Although CBA can be generalized to any equa- 
tion of state here we will only consider the hard sphere gas 
whose particle diameter is a constant fraction of the Bohr 
diameter. 



3.1.1 Details of the DSMC Algorithm 

The two sequential steps that simulate equation ((2]) are a 
ballistic or colhsionless computation based on the left-hand 
side and a collisional computation based on the right hand 
side. Practically, one advances the particles according the 
colhsionless Boltzmann equation, 

dl 

dt 

using standard n-body techniques. One then solves for the 
collisions using 



v-Vx/ + F-Vv/ = 



(4) 



f = Q(/„f). 



(5) 



The collision operator is 



QifJ) 



k3 Js 



cr(|v- v,|,Sl)|v - V,| X 



[./(v')./(v:)-,/(v)/(v.)] dQdv, 



(6) 



where o-(-, •) denotes the collision cross section, the primed 
v' and v', describe all the possible post-collisional velocities 
of two particles colliding with respective pre-collisional ve- 
locities V and V,, and Q denotes the interaction angles. In 
DSMC, the collision operator is solved by a Monte Carlo 
realization of equation ((6]) as follows: 

(i) Move all of the particles coUisionlessly according to 
the mean field (eq. |4]) using the n-body solver. 

(ii) Partition particles into cells whose linear scale is of 
order the mean free path A. 

(iii) Compute collision frequency in a cell for all parti- 
cle species and interactions of interest. This discretizes the 
spatial dependence of the phase-space density / to provide 
/(v). 

(iv) Select random collision partners within cell; this sam- 
ples /(v) and ,f(v*) in equation ©. We assume that prob- 
ability that a pair collides only depends on their relative 
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velocity. That is, all particles in the cell are valid collision 
partners. The post-collision velocities (6 quantities) are con- 
strained by the conservation of momentum (3 constraints) 
and energy (1 constraint). The post-collision direction in the 
center of mass frame is specified by two randomly chosen 
variates. 

(v) Repeat these steps beginning with Step (i). 

In this way, the net change in the phase-space density im- 
plied by equations ^ and © are computed by evolving ve- 
locity distributions consistent with the interaction potential 
implicit in the cross section a. This, in turn, leads to mass, 
mom e ntum , and energy flux throu gh the collision cells. See 
iBirdI (|l994l ') and ICercignanil (|200Cll ) for additional practical 
details and theoretical underpinning of this approach. 

3.1.2 Algorithm ejjiciency and tuning 

DSMC is most efficient and accurate whe n the interaction 
cell size / is of order the mean free path A (|Sun et al.ll201ll ) 
and a free particle crosses the cell in a time step {vAt « I 
where v is the mean one-dimensional particle velocity and 
At is the time step). This motivates defining two scale-free 
quantities: 

ex = \/l, (7) 

€fi = vAt/l. (8) 

The cell size I is chosen to satisfy two additional constraints: 

1) / should be smaller than any flow scale of interest; and 

2) the cell size should enclose approximately 10 particles. 
Clearly, if the cell size is so small that the probability of oc- 
cupation is small, no interactions are possible. Conversely, 
if the occupation number is large, the expense may in- 
crease without improving the accuracy of the result. This 
parametrization leads to the following limiting scenarios and 
tuning prescriptions: 

• ex ~0(l),e/, > 1. 

The some particles will pass through multiple cells in one 
time step, possibly leading to artificial viscosity (see i]3.1.4p . 
The remedy is to decrease step size, At. 

• eA~0(l),e/;<l. 

Particles can not reach their interaction partners in one time 
step. This leads to excess correlation. The remedy is to 
increase step size. At. In our DSMC implementation, the 
appropriate time step is selected automatically selected for 
each gas particle based on these criteria as long as the times 
step criteria demanded by the Poisson solver (eqs. I23H25|) 
are not violated. 

• eA>l,e/i~C'(l). 

Mean free path is very short compared to any scale of in- 
terest. The system is approaching the continuum limit with 
many collisions per particle in a time step. One remedy is 
to use a collision limiting scheme. Here, we use the Equi- 
librium Particle Simulation Method (EPSM) described in 
>i3.1.3l When using EPSM, the relevant velocity in equation 
((8)1 is the mean fiow velocity (v) not the mean ballistic ve- 
locity V. Of course, if this condition obtains everywhere in 
the computational volume, all cells will use EPSM and the 
resulting calculation will be in the CFD regime. 

• EA < l,e/i -0(1). 

Mean free path is very large compared to any scale of in- 
terest. The partial remedy is to increase I and possibly At 
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Figure 2. Sciiematic description of DSMC parameters on the 
e\-efi plane. The light green box shows the desirable range for 
the two parameters for efficiency. The dark green box shows the 
ideal range. If efi is too large, particles may cover multiple cells 
for ^A ^ 1 (red shaded region); the step size should be reduced. If 
it is too small, the particles may interact with particles that can 
not be reached by free flight in one time step. For €a ^ 0-1 (blue 
shaded region), the limiting EPSM algorithm is used. 



as long as I remains small than any scale of interest. Oth- 
erwise, the system is approaching the collisionless limit and 
all is well. 

This regimes are summarized schematically in Figure [2l 



3.1.3 Contmuum-DSMC Hybrids 

DSMC is computationally expensive, although much 
cheaper the molecular dynamics. The expense can be mit- 
igated by only using DSMC where it is needed! A num- 
ber of recent contributions to the literature describe hybrid 
Navier-Stokes-DSM C solvers usin g fi nite-el e ment methods 
and AMR techniques Garcia et al. fe.g. ll999r ): IWissink et al.l 
(e.g. 2001); Wijcsinghc ct al. (e.g. 2004j). Because of the un- 
usual geometries and large dynamic range in astrophysical 
flows, the implementation of the kinetic theory is greatly 
simplified by using a particle method throughout these sim- 
ulations rather than a hybrid DSMC-Navier-Stokes code. A 
pure particle code easily accommodates the arbitrary geom- 
etry of interfaces, simplifies the transition between regimes 
without the numerical artifacts owing to the statistical noise 
incurred by moving between the particle simulation and con- 
tinuum representation. However, in the same the high den- 
sity regions that are collision dominated and make DSMC 
infeasible, the system will a pproach th ermodynamic equi- 
librium. Based on this. Bird (|Birdll2007l ) suggested that the 
number of collisions per particle be limited to only the num- 
ber necessary to achieve equilibrium. 

A further simplification eliminates individual collision 
computations altogether when the density and collision rates 
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are high. If one can predict that the number of collisions 
will be sufficient to achieve equilibrium without simulat- 
ing the collisions, the limiting thermodynamic state for a 
collision-dominated cell may be computed a priori, and one 
may achieve a significant computational ad vantage . Thi s 
high-collision limit of DSMC was proposed bv lPullinI (|f980l ) 
who called it the Equilibrium Particle Simulation Method 
(EPSM). This app roach has been more recently explored by 
iMacrossanI (|2001h . In essence, EPSM selects new velocity 
components from the equilibrium thermodynamic distribu- 
tion within each cell while simultaneously conserving the 
total energy and momentum. An estimate of the tempera- 
ture, T may be derived from the total energy of the finite 
sample of ric particles in a cell using the standard relation 
for the mean energy of single degree of freedom edof = \kbT. 
Although PuUin's application was a single species in one di- 
mension, the approach is straightforwardly generalized to 
any number of species in three dimensions. We will com- 
bine the DSMC and EPSM collision computation approxi- 
mations using a heuristic such as the mean number of col- 
lisions between particles to switch between the two. More 
sophisticated collision limiting appr oaches with fewer limi- 
tations are under investigation (see lZhang et al.l 120081 ) and 
will be explored in the future. 

Owing to the random realization of evolved states in 
both DSMC and EPSM, the EPSM solution will differ from 
the DSMC solution and the exact molecular dynamics so- 
lution, even after many collisions. These differences lead to 
statistical scatter about the exact solution. Presumably, the 
scatter will be larger for DSMC than for EPSM, since ini- 
tially identical initial states will contain slightly different 
total amounts of energy and momentum. It might also be 
possible to improve the use of both algorithms together by 
conserving or matching incoming and outgoing fluxes at cell 
boundaries between the two regimes. 

EPSM is expected to be more efficient than DSMC close 
to thermodynamic equilibrium. However in DSMC, the time 
step is chosen to be of order the collision time. Obviously, 
this collision must be relaxed in EPSM, otherwise the mean 
number of collisions per particle would be C(l) and the dis- 
tribution could be far from equilibrium. In both schemes, 
large cell sizes 5x and step sizes Ai larger than the local col- 
lision time may be used in regions where the flow gradients 
are small. See i)3. 1.41 for additional complications. 



3.1.4 Effective mean free path and dissipation in DSMC 

To summarize, EPSM is an inflnite collision rate limit of 
DSMC with a finite sample of simulator particles. If the 
time step exceeds the time of flight across a typical cell, 
then particles with momentum and energy representative 
of the equilibrium distribution of one particular cell can be 
non-physically transported to a distant cell with possibly 
different equilibrium conditions. Close to equilibrium, this 
distance is approximately (fc_BT/m)^"A, where m is the 
particle mass. On the other hand, the distance covered by 
particles in Ai should be at least as large as the mean free 
path to prevent inducing non-physical correlations. If the 
criteria described in i]3.1.2l are close to ideal, the probability 
that particles will cross the cell in any one time step is small 
in the high-collision-rate limit. However, the ideal tuning 
parameters are not always computationally feasible. 



To understand the implications of these limits, let us 
consider transport of particles between two adjacent cells. 
Consider a Euclidean set of coordinate axes for a collision 
cell, {x, y, z) with x chosen perpendicular to the cell face 
common to the adjacent cells. Let this face be located at 
a:: = 0. Let the mass density of the two cells to the left 
and right of the boundary a; = be denoted by p^^ and 
p^", respectively. For notational simplicity, let factor the 
phase-space distribution function into a density and velocity 
distribution function: /(x, v) = p(x.)g{v); this is consistent 
with our discretization of the phase space into collision cells. 
The velocity distribution function describing the state in 
the adjacent cells will be different in general: g^'^ 7^ g''". 
That is, particles that cross the cell boundary with V:c > 
{vx < 0) have the parent distribution function g^" {g''"). 
This discontinuity gives rise to an effective viscosity. 

Demonstrating this explicitly is an easy ex ercise using 
the s tandard techniques from kinetic theory (e.g. ICercign anil 
ll99Cr ). Let q denote some velocity moment of g, such as the 
momentum. The net flux of the q across the interface, J^J, 
can be split into two contributions by the direction of the 



particles crossing the cell face, Tg 



t: 



ro 



<0 



p 



dvx 



dVn: 



dVy 



dvy 






as follows: 



dv^g Vxq{v), (9) 



dv^g^ V:cq{v). (10) 



In general, the gradient in velocity parallel to cell wall will 
not vanish across the interface: dvy/dx 7^ 0. Setting q — mvy 
and taking g^° and g''^ to be Maxwell-Boltzmann distribu- 
tions with the temperatures T^*^ and T-*" in the adjacent 
cells, the momentum fluxes parallel to the cell wall are 

-F<S„ = I [1 + erf [{v<')/^2k,T<ym)] p<'{v<'){v<°) 



+ 



27rm 



-<^<">'^/2fe,T<"/™ <0/ <0x 



(11) 



•^ rr. 



i [1 - erf [{v>'')/V2k,T>ym)] p>'\v>°){v>''>) 
_y^e-<''^>°>^/-^-"°/™p>°K>°> (12) 



(13) 



where 



dvx / dvy / dv.g<°vf (14) 

-00 J~oo J —00 

and analogously for {Vj"^ ). Assume that there are no addi- 
tional gradients for simplicity; this implies that T = T^ = 
T^" and p = p^° = p^". In addition, assume that the 
mean velocity is small compared to the thermal velocity, 
{vf ) ^ \/kbT/m. Then, the net momentum flux across 
the cell boundary becomes 



.JmVy — P 



2Tvm 



{(v>°)-{v>''))+0{(v,f). 



(15) 



Now, define the distance that the mean particle trans- 
ports its momentum as s. The value of s will depend both 
the details of DSMC code and the physics of the interactions 
and gives us an approximation to the gradient: 



dvy 
dx 



iv^) - «') 
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Combining these relations, we may express the net momen- 
tum flux as 



Trr^ 



kbT dvy 
2Tim dx 



(16) 



This is a shear stress induced at the ceU boundary by the 
discretization. 

Physically, viscosity arises from the shear stress at an 
interface that that opposes an applied force. The classic ex- 
ample is the laminar flow of a viscous fluid in the space be- 
tween two relatively moving parallel plates, known as Cou- 
ette flow. The force applied to the plates causes the fluid 
between the plates to shear with a velocity gradient in the 
direction of relative motion. In other words, the shear stress 
between layers is proportional to the velocity gradient in the 
direction perpendicular to the layers: 



dv 
dx 



(17) 



where fi is the proportionality factor called the viscosity. 

Comparing equation (I16|l that describes our discretiza- 
tion shear stress to the usual relation between mean free 
path and viscosity for planar Couette flow f eg. 1171) . we may 
estimate the effective viscosity for the simulation in the near 
equilibrium limit to be 



l^sim 



ksT 



(18) 



2Tvm 
The mean free path for a hard sphere with viscosity fi is 



l|Cercignanilll99Clf ). This implies that the effective mean free 
path due to the transport across the cell boundary in the 
simulation is then Xsim ~ s/2. 

The EPSM algorithm mixes the transported momentum 
into the entire cell of size I. Interestingly, this suggests that 
s — I and the effective mean free path in DSMC simulation 
becomes 



Xdsmc = max{//2. A} 



(20) 



where A is the local mean free path. This is equivalent to 
the DSMC requirement that the cell size should be smaller 
than the collisional mean free path for physical accuracy. 
However, when scale of all macroscopic observables is large 
compared to the mean free path, the less expensive EPSM 
solution should closely approximate the DSMC solution. In 
other words, DSMC and EPSM should give similar results 
when the number of collisions per particle per time step in 
DSMC is large, which justifles the use of cell sizes much 
larger than the mean free path and artiflcially limiting the 
number of collisions per particle in this limit. 

Similarly, the same arguments leading to equation (|16p 
suggest that if DSMC grid cells are too large {I ^ X), non- 
local particles will be selected for collisions resulting in the 
numerical diffusion of gradients on the cell-size scale. Con- 
versely, if DSMC cells are too small {I <^ X), but still contain 
the same number of particle per cell, the number of simu- 
lated particles becomes far more than necessary, resulting in 
an accurate but highly inefficient simulation. 

In some flow problems of astrophysical interest, it is not 
possible to use a sufficient number of particles to populate 



cells at the mean-free-path scale. In these cases, we resort 
to using an under-resolved collision-limited DSMC solution 
with large cells. As we have seen, the overall effect of such an 
approximation is to misrepresent the transport coefflcients 
giving rise to an effective viscosity. This should not lead to 
significant misestimations as long as the gradients of the ffow 
are small with length scales greater than the cell size. 

3.2 Implementation 

3.2.1 N-body code 

We have implemented the DSMC kinetic algorithm in our 
n-body expansion code, EXP. Owing to its long effective 
relaxation times, the expansion or "self-consistent field" al- 
gorithm (SCF, e.g. Hernquist & Ostriker 1992) is ideal for 
representing the large-scale global response of a galaxy to 
a long-term disturbance. Weinberg (1999) extended this al- 
gorithm from fixed analytic bases to adaptive bases using 
techniques from statistical density estimation to derive an 
optimal smoothing algorithm for SCF that in essence selects 
the minimum statistically significant length scale. This is 
combined with the empirically determined orthogonal func- 
tions that best represent the particle distribution and sepa- 
rate any correlated global patterns. The rapid convergence of 
the expansion series using this matched basis minimizes fluc- 
tuations in the gravitational force by reducing the number 
of degrees of freedom in the representation and by increas- 
ing the signal-to-noise ratio for those that do contribute. 
In addition, EXP contains a particle-in-cell Poisson solver 
as well as a parallelized direct-summation solver. The for- 
mer may be ideal for investigating the importance of small 
self-gravitating spatial inhomogeneities that are a small part 
larger astrophysical flow. 

The EXP code is modular with an object-oriented archi- 
tecture and easily incorporates DSMC. The standard SCF 
algorithm is "embarrassingly" parallel and the implemen- 
tation here uses the Mess age Passing Interface communi- 
cations package (MPI, e.g. ICropp et al.ll 19991 : ICabriel et al.l 
|2004,), making the code easily portable to a variety of par- 
allel systems. Practically speaking, parallel SCF with the 
new algorithm makes tractable disk and halo simulations 
on modest-sized PC-based clusters with up to 100 million 
particles. 

EXP uses a multiple time step algorithm as follows. We 
begin by partitioning phase space m + 1 ways such that each 
partition contains Uj particles that require a time step 



St J = 2~^ST, with j = 0, 



(21) 



where 5T is the largest time step. The time step with j = m 
corresponds to the smallest single time-step in the simula- 
tion. Since the total cost of a time step is proportional to 
the number of force evaluations, this algorithm improves the 
run time by the following factor 

m m 

S = ^n,2V^n,2" 



m mm 

^n,2--+V^n, = l^n,2 



J — m+l 



3=0 



3=0 



(22) 



3=0 



where A'' — X]"Lo ^J' ^^ ^^^ particles were the deepest level, 
j = m, we have S = 1. On the other hand, if most of the 
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particles are in the level j — 0, w e have S = 2"*. F or an 
c = 15 NFW dark-matter profile (jNavarro et al.lll997h with 
A'^ — lO'^ particles as an example, we find that m = 8 and 
S ~ 30, an enormous speed up! Forces in the SCF algorithm 
depend on the expansion coefficients and the leap frog algo- 
rithm requires interpolation of these coefficients to maintain 
second-order error accuracy per step. The contribution to 
each expansion coefficient for particles in time-step parti- 
tion j are separately accumulated and linearly interpolated 
for levels k > j as needed. Higher-order interpolation would 
have higher-order truncation error than the ordinary differ- 
ential equation solver and would be wasteful. 

EXP uses the minimum of three separate time step cri- 
teria for choosing the appropriate time-step partition j for 
each particle i: 



(i) A local force time scale: 



Ati = ei 



|V*J 



(23) 



This sets the time step to be a fraction ei of the estimated 
change of local velocity by the gravitational force; that this 
is, this time step is based on the rate of velocity change 
implied by the gravitational field. 
(ii) A local work time scale: 



Af2 = £2-; 



I*. 



V$> 



(24) 



This sets the time step to be a fraction £2 of the time required 
to change the particle's potential energy, 
(iii) And a local escape time scale: 



Af3 = £3 



I*. 



|V$J 



(25) 



This time step will be small for a fast moving particle near 
a local feature in the gravitational field or for a particle that 
samples a large range of gravitational field strengths over its 
orbit. 

We calibrate the leading coefficients et for accuracy using a 
test simulation for each new set of initial conditions. Typ- 
ically Efc ~ 0.01 yields better than 0.3% accuracy over a 
dynamical time. 



3. 2. 2 DSMC parallelization 

Our implementation uses an octree for partitioning space 
to make DSMC collisions cells. Each node in the octree is a 
rectangular prism most often chosen to be a cube. Each node 
is dividing into eight child nodes of equal volume formed by 
a single bisection in each dimension. Octrees are the three- 
dimensional analog of quadtrees. Unlike kd-trees, the octree 
partitioning is on volume alone, so that the aspect ratio of all 
nodes are self similar. This is desirable for collisions cells that 
assume that any particle in a cell whose linear dimension is 
approximately a mean free path is a good collision partner 
for another particle from that same cell. 

Most contemporary supercomputers are networked 
clusters with multiple multi-core processors per node. Each 
of the CPU cores share the memory and communication 
resources of the parent node, and each node is intercon- 
nected to the others through a fast low-latency network. 



To simplify the parallel domain decomposition and exploit 
the typical cluster topology, we use a two-level nested tree 
as follows. The first level is a coarse octree constructed with 
approximately an order of magnitude more cells than nodes. 
The computational work for all particles belonging to leaves 
of the coarse tree is accumulated and used to balance load 
among the nodes when the domain is decomposed. The sec- 
ond level constructs a tree in each leaf of the coarse first-level 
tree. Spatially contiguous leaves are preferentially assigned 
to the same node. The goal of the second- level tree are collec- 
tions of interaction cells that contain approximately ric — 10 
simulation particles. Aggregated cells or super cells used for 
statistical diagnostics contain A'^c > ?^c particles, typically 
Nc ~ lOric. To minimize the intranode communication over- 
head, we run one nmltithreaded process per node. Multi- 
threaded tasks include determining expansion coefficients, 
force evaluation, and DSMC collision computations. 

The full domain decomposition is computationally ex- 
pensive and would horribly dominate the run time if per- 
formed at the smallest time step: j = m in equation (I2ip . 
Rather, the tree is extended as necessary by updating the 
particle-node association and constructing new tree nodes as 
necessary for time intervals short compared to the evolution 
time scale but long compared Stm. The interim procedure 
updates the octree consistently but without exchanging par- 
ticles between processors. At some preselected intermediate 
time step assigned by choosing niadj with Sj madj ^ rn, the 
domain decomposition is reconstructed anew, exchanging 
particles between processors as necessary. In other words, 
at time steps with j ^ rUadj, the both the coarse- and fine- 
grained octrees are recomputed from scratch. At time steps 
with j > mad], the fine-grained octrees are updated only. It 
is therefore possible to have the same cell populated by dif- 
ferent particles on separate nodes for a short period of time 
during the simulation. This will introduce some errors since 
the densities in the duplicated cells will be underestimated. 
However, the value of rUadj is chosen so that Stadj remains 
smaller than any characteristic dynamical time except for 
the collision time, and therefore, I do not anticipate that 
this procedure will lead to significant quantitative or qual- 
itative errors. If there is any doubt, the value of madj rnay 
be increased to m as check. Along the same lines, DSMC 
and DSMC-EPSM explicitly conserve energy and momen- 
tum at the collision scale, including those describing atomic 
and molecular internal degrees of freedom. Therefore, even if 
small-scale features are mildly compromised by duplication 
errors, these are unlikely to propagate. 

3.2.3 Cooling and heating 

The Monte Carlo realization of the collision integral depends 
on the interaction cross section from equation © and defines 
the probability of internal excitation, ionization, recombina- 
tion, radiation or scattering. Since the spontaneous emission 
times are most often very small compared to the collision 
times for astrophysical fiows, therefore we do not need track 
the excitation state of the atoms and molecules between col- 
lisions. Thus, coUisional cooling is a straightforward natural 
by-product of DSMC. 

However, there are some intrinsic difficulties in do- 
ing this accurately that require some additional specialized 
methods. For example, the existence of trace ionized com- 
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ponents (e.g. the elements C, N, O) that have a significant 
effect on cooling but carry negligible momentum can not be 
simulated naively if there will be fewer than one trace-species 
particle per interaction cell. We elect therefore to change the 
ratio of the true particle number to simulation particle num- 
ber for trace species. Then, we populate the simulation with 
sufficient numbers of the trace species to yield accurate sta- 
tistical representation of the trace interactions with weights 
to account for their true mass fraction. 

A further issue is the treatment of free electrons. In 
principle, it is not difficult to implement free electrons as 
a separate species. However as a consequence, the large ve- 
locities of the electrons relative to the ions in equipartition 
require very small time steps and lead to infeasibly large 
computational expense. We will approach this problem in 
two incremental steps. In the first, we will assume that the 
electrons remain in the vicinity of their parent ions, both 
preventing charge separation and eliminating the need for 
time steps on order of the mean flight time of the simulated 
electrons. In the second, we will include the free electrons as 
a separate species to feel the electrostatic field induced by 
charge separation. Computational requirements will restrict 
this application to very small simulation regions and will be 
used, primarily, to explore mesoscale dynamics in various 
astrophysical regimes. 

Alternatively, one may account for the energetic affects 
of the trace species by using their reaction rates computed 
from the thermodynamic properties approximated from the 
DSMC fields. In particular, the electronic level populations 
due to coUisional and radiative excitation may be described 
by rate equations using the electron density, temperature 
and background radiation field. Even if the flow is far from 
equilibrium, as long as the processes that establish the elec- 
tronic level populations occur much more quickly than the 
time for any significant flow pattern to change, an equilib- 
rium distribution will be a fair approximation in many situ- 
ations. This approach is called the quasi static state (QSS) 
approximation. The electron densities, temperatures and ion 
densities may then be used to evaluate the QSS rates. The 
QSS rates can be obtained from individual particle cross sec- 
tions by assuming that the electron velocity distributions are 
Maxwell-Boltzmann. These same set of assumptions yields 
the standard cooling curves us ed in cosmo logical and ISM 
continuum gas simulations fe.g. lBlackll 19811) . Heating by cos- 
mic rays, photoelectric heating, etc. may be computed sim- 
ilarly using the QSS method. 

In this and Paper 2, we will employ such an approxi- 
mate QSS solution based on LTE. Then to treat the elec- 
trons, we estimate the ionization fraction based on the local 
thermal state and assume that the electrons follow the ions 
in space. The implementation here computes the effective 
temperature from the super cell of Nc particles (see ^3.2.2|) 
and estimates collision rates based on a total effective hard- 
sphere geometric cross section. 

3.2.4 EPSM implementation 

For very small mean free paths, our algorithm transitions to 
the Navier-Stokes equations using the equilibrium particle 
simulation through the use of EPSM ( i]3.1.3|) . If the number 
of mean collisions per body in a cell exceeds some preselected 
value, Ucoii The EPSM update is performed in one of two 



ways. First, using the original IPuUinl (|l98(l ) algorithm. This 
algorithm constructively computes random variates while si- 
multaneously enforcing the constraints of momentum and 
energy conservation, much the same way as in the proof for 
distribution of the cons istency of the sample variance (e.g. 
iKendall fc StuartlllQS S!). Alternatively, it is straightforward 
to realize normal variates with any convenient mean and 
variance, followed by a shift and scale operation to recover 
the conserved total momentum and energy. Both algorithms 
were implemented for completeness and yielded equivalent 
values but the latter is faster in tests and is the default. The 
use of EPSM may be toggled by a run-time parameter. In 
addition, if EPSM is not used, one may elect to limit the 
total number of collisions per cell to a maximum value as 
suggested bylBird (2007) while maintaining the correct heat- 
ing and cooling rates consistent with the predicted number 
of collisions. 



3.2.5 Run-time diagnostics 

The simulation particles directly represent the physical 
properties of the gas. They are not tracers of an underly- 
ing fleld but rather all momentum, kinetic energy, internal 
energy and chemical fluxes must be computed directly from 
the particle distribution. Therefore, the density, temperature 
and other traditional field quantities may only be estimated 
as an ensemble average. Owing to the typically small num- 
ber of particles, Uc, in an interaction cell, these estimates 
are computed using the super cells with Nc particles (see 
3332J- As a consequence, the estimates used for producing 
a field representation for the gas' physical state have lower 
spatial resolution than the simulation but are useful diagnos- 
tics nonetheless. In contrast, graphical representations from 
hydrodynamic simulations represent the full computed res- 
olution of the field quantities. Our current implementation 
computes ensemble temperature, density, Knudsen number, 
and cell size to flight length ratio. These quantities are car- 
ried by each gas simulation particle that are saved in phase- 
space output files. 

In addition, the EXP-DSMC module keeps copious di- 
agnostics on the particle time of flight distribution, the 
mean-free path, and energy radiation rates per cell to ensure 
that the conditions required for an accurate DSMC simula- 
tion are maintained (see i]3.1.2|l . The necessary time step 
required for the particles in each cell is fed back into the 
time step selection algorithm to adaptively change the stem 
steps for particles as described in i]3.2.1l 



4 TESTS 

4.1 Shock tube 



The Riemann shock tube (|Sodl 1 19781 ) is a special case of 
a Riemann problem in Eulerian hydrodynamics and is de- 
flned by an initial state with two fluids of different density 
and pressure divided by a planar interface. The Rankine- 
Hugoniot conditions allow one to compute the flow prop- 
erties across the shock. Using this, an exact solution may 
be obtained analytically for an adiabatic gas from the one- 
dimensional Euler equations written conservation form. Be- 
cause an exact solution is straightforwardly computed, the 
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Riemann shock tube has become a standard benchmark for 
computational fluid dynamics problems. The solution may 
be characterized, briefly, as follows. Assume that the gas 
with higher pressure and density is to the left of the inter- 
face, with the left to right direction denoted by x. The self- 
similar solution naturally defines five regions labeled here 
from 1-5 by increasing values of x. Regions 1 and 5 have 
states which correspond to the left and the right initial state 
respectively. Regions 3 and 4 have steady states solutions 
independent of x. Region 2 represents an expansion wave, 
also called a rarefaction wave, which moves to the left and 
varies with x. It is the only non-constant region in the solu- 
tion. The dividing line between Regions 3 and 4 is a contact 
discontinuity, i.e. a line separating two fluids of different en- 
tropy but the same pressure and the same velocity. It moves 
slowly to the right. Regions 4 and 5 are separated by a for- 
ward moving shock wave , moving quickly to the right. See 
iBodenheimer et al.l (|2006r ) for details. 

Here, we perform four tests for two different sets of 
initial conditions. The flrst set of initial conditions is the 
original Sod example with (p, P, v)l = (1, 1, 0), (p, P, v)r = 
(1/8, 1/10, 0) and the second set is a strong shock case with 
{p,P,v)l = (10, 100,0), (p,P,«)fl = (1,1,0) suggested by 
F. X. Timme^. For each initial condition, we tested a pure 
DSMC and the hybrid EPSM/DSMC with indistinguishable 
results. Figure [3] compares the results of the pure DSMC 
simulation and the exact Riemann solution for both cases 
l|Tord 11993 ). The remarkable reproduction of the analytic 
results by the DSMC and the DSMC-EPSM hybrid meth- 
ods is expected owing to the local nature of the momentum 
and energy transport in a kinetic simulation. Hydrodynam- 
ics solvers, on the other hand, often require globally coupled 
solutions of flux equations that can be numerically unstable. 



4.2 Kelvin Helmholtz instability 

The traditional Kelvin-Helmholtz (KH) instability is deflned 
for the flow of two incompressible inviscid fluids with an in- 
flnite plane-parallel interface. Each fluid has different bulk 
velocities vi and V2 parallel to the interface between the two 
fluids with densities pi and p2. We assume that vi > V2. 
For ease of discussion, we observe the fluid from the frame 
of reference moving at {vi +V2)/2; this implies that Fluid 1 
moves to the right with velocity vi — V2 and Fluid 2 moves to 
the left with velocity V2—V\. Exactly at the interface, there 
is positive nearly uniform vorticity but the vorticity is zero 
on either side of the interface. With this initial condition, 
an external sinusoidal perturbation causes a growing insta- 
bility as follows. Consider a sinusoidal perturbation to the 
interface. The pressure increases in the concave regions and 
decreases in convex regions of the interface (i.e. the Bernoulli 
principle) which allows the perturbation to grow. The peak 
of the vortex sheet is carried forward by Fluid 1 and the 
trough is carried backward by Fluid 2. This causes the ini- 
tially sinusoidal corrugation interface to stretch, tighten and 
eventually roll up with the same sense as the vorticity at the 
original interface. 

In Nature, such interfaces abound and the KH instabil- 
ity is thought to be critical for understanding a wide vari- 
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Figure 3. Shock tube test for the classic Sod initial conditions 
and a harder, strong shock (as labeled). The density and pressure 
axis are shown at the left and right, respectively. The Regions 1—5 
described in the text are easily identified. The key identifies the 
numerical DSMC solutions and the exact Riemann solver results 
for both the original and strong-shock initial conditions. In both 
cases, the agreement is excellent. 



ety of astrophysical phenomena. For example, the non-linear 
development of the KH instability leads to turbulent mix- 
ing in a free shear layer. These same interfaces are critical 
in understanding the developmen t of j e ts in ra dio galax- 
ies and quasars ([ B egelm an ct al.] Il980l : ISirkins haw 199jJ; 
IPerucho fc Lobanwil2007l ). 

Figure |4] shows the non-linear development of a KH in- 
stability for a fluid box with periodic boundary conditions 
in the x and z directions and reflecting boundary conditions 
in the y direction. The simulation has lO'' particles and unit 
dimensions and temperatures T\ = 5000 K and T2 — 10000 
K. The density ratio of the shearing fluids is 2 with pres- 
sure equilibrium at the boundary, Tipi = T2p2- The rela- 
tive fluid velocity has Mach number 1/2. The disturbance 
is seeded with a transverse velocity amplitude of 1/4 the 
shear velocity and spatial frequency of 1/3 according to the 



© 2013 RAS, MNRAS OOO.ITHTSl 



Direct Simulation Monte Carlo 11 



Kelvin-Helmholtz dispersion relation (I Chand rasekharl 196 If ) . 
The instability develops and evolves as expected, consistent 
with the spatial frequency and velocity of the linear mode. 
The figure illustrates that the interface is well-maintained 
throughout the evolution. Of course, the particle numbers 
place a limitation on the maximum spatial frequency that 
may be resolved owing to the DSMC requirements on cell 



5 DISCUSSION AND SUMMARY 

Nearly all gas dynamical simulations on galaxy scales or 
larger are numerical solutions of the Navier-Stokes equa- 
tions (also known as CFD). Computational expediency has 
motivated the use of CFD even when the mean free path of 
particles becomes appreciable to the scales of interest, such 
as in the early phases of galaxy cluster formation or in the 
coronal-neutral gas interface at large galactocentric radii. 
In the transition between these regimes and in the rarefied 
regime where the mean free paths are of order or larger than 
scales of interest, the standard fluid approximation breaks 
down and one must solve the Boltzmann equation with colli- 
sions using the kinetic theory of gases in order to understand 
the true nature of the flow. 

Moreover, astrophysical flows are rife with multiphase 
interfaces and shocks. Shock interfaces are discontinuities 
in the fluid limit and there are many accurate and elabo- 
rate schemes for their numerical computation in the CFD 
pantheon. However, in many cases of interest, the dynamics 
of the particles in this interface regime is critical to under- 
standing the overall energetics through cooling and heat- 
ing and the observational signatures in the form of line and 
continuum strength predictions and are unlikely to be well- 
described by the LTE approximation. Such calculations also 
require a kinetic theory approach. 

This paper describes an implementation of a Monte 
Carlo solution to the coUisional Boltzmann equation known 
as Direct Simulation Monte Carlo (DSMC). Algorithmically, 
it splits the full Boltzmann equation into a purely collision- 
less left-hand side and a purely collisional right-hand side 
and solves the two parts sequentially. The solution of the 
former is solved is provided by a standard n-body proce- 
dure. The solution of the latter uses a space partition to 
define interaction domains for the simulated gas particles 
that are used to evaluate the Boltzmann collision integral 
(see i]3.1.ip . The implementation described here uses a dou- 
bly nested octree for decomposing the spatial domain; the 
first-level coarse-grained tree is used for load balancing and 
the second-level fine-grained tree is used by each process to 
construct interaction domains (see i)3.2.2|l . This approach 
is much faster than the brute-force approach of molecular 
dynamics but still much slower than CFD. 

Al though DSMC will wo rk even in the limit of dense 
gases ([Alexander et al.l 1 19951 ). we would like to use the 
full kinetic approach only when needed owing to its com- 
putational expense. A number of s uch approach e s hav e 
been proposed in the literature, e.g. Garcia et al.l l|l999l ): 
IWissink et ai] l|200ll '): IWiiesinghe et al] (|2004 ) present adap- 
tive mesh refinement schemes that use the Navier-Stokes 
equations or DSMC depending on the regime. The imple- 
mentation in this paper also takes a multiscale adaptive 



approach based on the local density of particles: when the 
mean free path becomes small co mpared the d e nsity scale, 
we u se a particle-only approach IjPullinl Il980l : iMacrossanI 
I2OOH ) which solves the Navier-Stokes equations in the limit 
of large particle number (see i]3.1.3|) . Furthermore, although 
a bit noisier and possibly slower than the CFD-particle hy- 
brid approaches, a particle-only scheme is straightforwardly 
parallelized and mated with a traditional n-body code. Al- 
though we have thrown away the large-scale averaging im- 
plicit in the continuum hydrodynamic equations by adopting 
a particle-only approach, we have gained a method that is 
explicitly stable (i.e. no Courant-Friedrichs-Lewy condition) 
and 100% shock resolving. 

We have seen in ij?] that this code correctly repro- 
duces the standard shock-tube tests and develops Kelvin- 
Helmholtz instabilities that follow the analytic dispersion 
relation, both in the continuum limit. This approach does 
require an order of magnitude (at least) more fluid parti- 
cles in the continuum limit, although has the advantage of 
consistently transitioning to dilute gases, correctly resolving 
shocks, and resolving phase boundaries. 

The tests in Paper 2 use the classic fixed-composition 
cooling curves in the LTE limit both to facilitate compar- 
ison with published results and for the ease of implemen- 
tation. However, the traditional DSMC implementation is 
based on individual particle cross sections. It is natural and 
straightforward to include multiple distinct species, inter- 
actions, and excitations within the DSMC framework and 
self-consistently compute the radiation spectrum from the 
gas in the optically thin limit. When the heating or cooling 
of the gas overall depends on specific elemental or molec- 
ular lines from species with low fractional number density 
(e.g. singly or multiply ionized oxygen, carbon and nitrogen 
with T > 10* K), these species may be included as tracer 
subspecies by solving rate equations in the QSS limit or 
using weighting schemes with or without particle produc- 
tion. As described in HI. H and N3.2.3I we are currently test- 
ing a DSMC implementation that includes any specie whose 
atomic data is included in the CHI ANTI atomic data base 
(|Dere et al.lll997l : lLandi et al.ll201ll ). including the standard 
plasma cross sections. Continuing to generalize the micro- 
physics, it should be possible to consistently include addi- 
tional plasma physics by adding the simultaneous solution 
of the electrostatic Poisson equation. Of course, the time- 
dependent solution of Maxwell's equations is a very stiff 
challenging problem, but intermediate charge-flow problems 
are tractable using DSMC. 

Paper 2 applies the hybrid DSMC-n-body code from 
this paper to study the effect of an ICM wind on a galaxy's 
ISM, commonly known as ram pressure. As illustrated by 
Paper 2, DSMC may help us understand the multiphase 
medium on small scales by enabling accurate treatment of 
interfaces in the ISM. Finally, on much larger scales, DSMC 
can be used to simulate the dominant processes in intra- 
cluster gas dynamics, such as the formation and interaction 
of bubbles, conduction at interfaces, etc. 
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Figure 4. A density slice in the non-linear development of a Kelvin-Helmholtz instability seeded with a wave with spatial frequency if 
1/3. 



REFERENCES 

Alexander F. J., Garcia A. L., Alder B. J., 1995, Phys. Rev. 
Lett., 74, 5212 

Begelman M. C, Blandford R. D., Rees M. J., 1980, Na- 
ture, 287, 301 

Bird G., 2007, in Notes from DSMC07 meeting, Santa Fe, 
September Sophisticated DSMC 

Bird G. A., 1994, Molecular gas dynamics and the direct 
simulation of gas flows. Clarendon Press 

Birkinshaw M., 1991, MNRAS, 252, 73 

Black J. H., 1981, MNRAS, 197, 553 

Bodenheimer P., Laughlin G. P., Rozyczka M., Yorke 
H. W., 2006, Numerical Methods in Astrophysics: An In- 
troduction. Series in Astronoiny and Astrophysics, Taylor 
& Francis 

Boyd I. D., Chen G., Candler G. V., 1995, Physics of Fluids, 
7, 210 

Cercignani C., 1990, Mathematical Methods in Kinetic 
Theory, second edn. Springer 

Cercignani C, 2000, Rarefied gas dynamics. From ba- 
sic concepts to actual calculations. Cambridge University 
Press 

Chandrasekhar S., 1961, Hydrodynamic and Hydromag- 
netic Stability. Oxford Univesity Press 

Dere K., Landi E., Mason H., Monsignori Fossi B., Young 
P., 1997, Astronomy and Astrophysics Supplement Series, 
125, 149 

Fritz J., 2001, Lectures in Mathematical Sciences, 18 

Gabriel E., Fagg G. E., Bosilca G., Angskun T., Dongarra 



J. J., Squyres J. M., Sahay V., Kambadur P., Barrett 
B., Lumsdaine A., et al., 2004, in , Recent Advances in 
Parallel Virtual Machine and Message Passing Interface. 
Springer, pp 97-104 

Garcia A. L., Bell J. B., Crutchfield W. Y., Alder B. J., 
1999, Journal of computational Physics, 154, 134 

Garcia A. L., Wagner W., 2000, Journal of Statistical 
Physics, 101, 1065 

Gropp W., Lusk E., Thakur R., 1999, Using MPI-2: Ad- 
vanced features of the message passing interface. Vol. 2, 
MIT Press (MA) 

Hernquist L., Ostriker J. P., 1992, ApJ, 386, 375 

Kendall M. G., Stuart A., 1983, The advanced theory of 
statistics, 4 edn. Vol. 1, C. Griffin 

Landi E., Del Zanna G., Young P., Dere K., Mason H., 
2011, The Astrophysical Journal, 744, 99 

Macrossan M. N., 2001, in 22nd International Symposium 
on Rarefied Gas Dynamics Vol. 585 of AIP Conference 
Proceedings, A particle-only hybrid simulation method for 
near continuum flow, pp 388-395 

Mo H., van den Bosch F., White S., 2010, Galaxy Forma- 
tion and Evolution. Cambridge University Press 

Montanero J. M., Santos A., 1996, Phys. Rev. E, 54, 438 

Navarro J. F., Frenk C. S., White S. D. M., 1997, ApJ, 490, 
493 

Perucho M., Lobanov A. P., 2007, a, 469, 23 

PuUin D. I., 1980, J. Comput. Phys., p. 231 

Serikov V. V., Kawamoto S., Nanbu K., 1999, Plasma Sci- 
ence, IEEE Transactions on, 27, 1389 

Sod G. A., 1978, J. Comput. Phys., 27, 1 



© 2013 RAS, MNRAS OOO.nHTHI 



Direct Simulation Monte Carlo 13 



Sun Z.-X., Tang Z., He Y.-L., Tao W.-Q., 2011, Computers 
& Fluids, 50, 1 

Toro E. F., 1999, Riemann Solvers and Numerical Methods 
for Fluid Dynamics: A Practical Introduction, 2nd edn. 
Springer 

Wagner W., 1992, J. Statist. Phys., 66, 1011 

Weinberg M. D., 1999, AJ, 117, 629 

Weinberg M. D., 2013, MNRAS, submitted 

Wijesinghe H. S., Hornung R. D., Garcia A. L., Hadjicon- 
stantinou N. G., 2004, J. Fluids Eng., 126, 768 

Wissink A. M., Hornung R. D., Kohn S. R., Smith S. S., El- 
liott N., 2001, in Supercomputing, ACM/IEEE 2001 Con- 
ference Large scale parallel structured amr calculations 
using the samrai framework, pp 22-22 

Zhang R., Yao W., Li J., 2008, Communications in Non- 
linear Science and Numerical Simulation, 13, 2203 



© 2013 RAS, MNRAS OOO.IlHTSl 



